Using syntactic information in handling natural language queries for extended boolean retrieval model

Authors

  • Geunbae Lee
  • Mihwa Park
Abstract

There is considerable evidence that trained users can achieve good search effectiveness through structured boolean queries rather than simple keyword queries, because boolean operators help to represent users' information needs more accurately. However, it is not normally easy for ordinary users to construct effective boolean queries using appropriate boolean operators. In this paper, we propose a syntax-based technique for handling natural language queries and phrases for the extended boolean retrieval model in order to pursue both search effectiveness and user convenience. First, natural language queries are syntactically analyzed using a Korean natural language parser, and the resulting syntactic trees are structurally simplified using a tree-simplifying mechanism in order to capture the logical relationships between keywords. Second, in a simplified tree, plausible noun phrases are identified and added to the tree as additional keywords for more precise retrieval. Finally, the tree is automatically converted into a boolean query using mapping rules and linguistic heuristics. We also propose an n-best tree method, which uses the top n syntactic trees to compensate for the detrimental effects of a single incorrect top syntactic tree. In experiments using KTSET2.0 (a Korean standard document set), the proposed method outperformed natural language models without any syntactic analysis by 23% and, surprisingly enough, outperformed even manually constructed boolean queries by 8% in the 11-point average precision measure.
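The 11-point average precision measure named in the abstract is a standard ranked-retrieval metric. The abstract does not say exactly how it was computed for KTSET2.0, so the following is only a minimal sketch of the usual interpolated definition: precision is interpolated at the recall levels 0.0, 0.1, ..., 1.0 and the eleven values are averaged. The function name and the document identifiers are hypothetical, chosen for illustration.

```python
def eleven_point_average_precision(ranked_ids, relevant_ids):
    """Interpolated 11-point average precision for one query.

    ranked_ids   -- document ids in the order the system returned them
    relevant_ids -- ids judged relevant for the query
    (Both names are illustrative; this is the textbook definition,
    not necessarily the exact procedure used in the paper.)
    """
    relevant = set(relevant_ids)
    if not relevant:
        raise ValueError("at least one relevant document is required")

    # Record (recall, precision) each time a relevant document is seen.
    points = []
    hits = 0
    for rank, doc_id in enumerate(ranked_ids, start=1):
        if doc_id in relevant:
            hits += 1
            points.append((hits / len(relevant), hits / rank))

    # Interpolated precision at a recall level is the maximum precision
    # observed at any recall greater than or equal to that level.
    total = 0.0
    for i in range(11):
        level = i / 10
        candidates = [p for r, p in points if r >= level - 1e-12]
        total += max(candidates, default=0.0)
    return total / 11


if __name__ == "__main__":
    ranking = ["d1", "d2", "d3", "d4"]
    print(eleven_point_average_precision(ranking, {"d1", "d3"}))
```

For a system-level figure, this per-query value would typically be averaged over all test queries; relative improvements such as the 23% and 8% in the abstract are then comparisons between such averages.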

Similar resources

Improved Skips for Faster Postings List Intersection

Information retrieval can be achieved through computerized processes by generating a list of relevant responses to a query. The document processor, matching function and query analyzer are the main components of an information retrieval system. Document retrieval system is fundamentally based on: Boolean, vector-space, probabilistic, and language models. In this paper, a new methodology for mat...

Freestyle vs. Boolean: A Comparison of Partial and Exact Match Retrieval Systems

Although Boolean searching has been the standard model for commercial information retrieval systems for the past three decades, natural language input and partial-match weighted retrieval have recently emerged from the laboratories to become a searching option in several well-known online systems. The purpose of this investigation is to compare the performance of one of these partial match opt...

Natural Language Processing and XML Retrieval

XML information retrieval (XML-IR) systems respond to user queries with results more specific than documents. XML-IR queries contain both content and structural requirements traditionally expressed in a formal language. However, an intuitive alternative is natural language queries (NLQs). Here, we discuss three approaches for handling NLQs in an XML-IR system that are comparable to, and even out...

Public Transport Ontology for Passenger Information Retrieval

Passenger information aims at improving the user-friendliness of public transport systems while influencing passenger route choices to satisfy transit user’s travel requirements. The integration of transit information from multiple agencies is a major challenge in implementation of multi-modal passenger information systems. The problem of information sharing is further compounded by the multi-l...

Publication date: 1999